
    Voice biometric system security: Design and analysis of countermeasures for replay attacks.

    PhD Thesis. Voice biometric systems use automatic speaker verification (ASV) technology for user authentication. Even if it is among the most convenient means of biometric authentication, the robustness and security of ASV in the face of spoofing attacks (or presentation attacks) is of growing concern and is now well acknowledged by the research community. A spoofing attack involves illegitimate access to personal data of a targeted user. Replay is among the simplest attacks to mount, yet difficult to detect reliably, and is the focus of this thesis. This research focuses on the analysis and design of existing and novel countermeasures for replay attack detection in ASV, organised in two major parts. The first part of the thesis investigates existing methods for spoofing detection from several perspectives. I first study the generalisability of hand-crafted features for replay detection that show promising results on synthetic speech detection. I find, however, that it is difficult to achieve similar levels of performance due to the acoustically different problem under investigation. In addition, I show how class-dependent cues in a benchmark dataset (ASVspoof 2017) can lead to the manipulation of class predictions. I then analyse the performance of several countermeasure models under varied replay attack conditions. I find that it is difficult to account for the effects of various factors in a replay attack: acoustic environment, playback device and recording device, and their interactions. Subsequently, I developed and studied a convolutional neural network (CNN) model that demonstrates comparable performance to the one that ranked first in the ASVspoof 2017 challenge. Here, the experiment analyses what the CNN has learned for replay detection using a method from interpretable machine learning. The findings suggest that the model attends heavily to the first few milliseconds of test recordings in order to make predictions.
Then, I perform an in-depth analysis of a benchmark dataset (ASVspoof 2017) for spoofing detection and demonstrate that any machine learning countermeasure model can still exploit the artefacts I identified in this dataset. The second part of the thesis studies the design of countermeasures for ASV, focusing on model robustness and avoiding dataset biases. First, I proposed an ensemble model combining shallow and deep machine learning methods for spoofing detection, and then demonstrate its effectiveness on the latest benchmark datasets (ASVspoof 2019). Next, I proposed the use of speech endpoint detection for reliable and robust model predictions on the ASVspoof 2017 dataset. For this, I created a publicly available collection of hand-annotations of speech endpoints for the same dataset, and new benchmark results for both frame-based and utterance-based countermeasures are also developed. I then proposed spectral subband modelling using CNNs for replay detection. My results indicate that models that learn subband-specific information substantially outperform models trained on complete spectrograms. Finally, I proposed to use variational autoencoders (deep unsupervised generative models) as an alternative backend for spoofing detection and demonstrate encouraging results when compared with the traditional Gaussian mixture mode

    Data Quality as Predictor of Voice Anti-Spoofing Generalization

    Voice anti-spoofing aims at classifying a given utterance either as a bonafide human sample or as a spoofing attack (e.g. a synthetic or replayed sample). Many anti-spoofing methods have been proposed, but most of them fail to generalize across domains (corpora), and we do not know why. We outline a novel interpretative framework for gauging the impact of data quality upon anti-spoofing performance. Our within- and between-domain experiments pool data from seven public corpora and three anti-spoofing methods based on Gaussian mixture and convolutive neural network models. We assess the impacts of long-term spectral information, speaker population (through x-vector speaker embeddings), signal-to-noise ratio, and selected voice quality features.